Abstract

We propose a novel method for computing distances between
high-dimensional vector pairs that utilizes encodings
generated by vector quantization. Vector quantization based
methods are successful in handling large-scale
high-dimensional data. These methods compress vectors into
short encodings and allow efficient distance computation
between an uncompressed vector and a compressed dataset
without explicit decompression. However, for large
datasets, these distance computing methods perform
excessive computations. We avoid the excessive computations
by storing the encodings in an Encoding Tree (E-Tree);
interestingly, the memory consumption is also lowered. We
also propose the Encoding Forest (E-Forest) to further lower
the computation cost. E-Tree and E-Forest are compatible
with various existing quantization-based methods. We show by
experiments that our methods speed up distance computation
for high-dimensional data drastically, and that various
existing algorithms can benefit from our methods.

Introduction 

The rapid development of the Internet in recent years has
brought explosive growth of information online. Researchers
have been developing methods that utilize such huge amounts
of data for machine learning, information retrieval,
computer vision, etc. Because the majority of large-scale
datasets consist of high-dimensional data, there is an
increasing need for efficient basic operations such as
evaluating distances and computing scalar products.

Product Quantization (PQ) (Jegou, Douze, and Schmid 2011)
is a typical method for fast distance and scalar product
computation on high-dimensional data. PQ compresses
high-dimensional data into short encodings, and is able to
evaluate distances or scalar products between uncompressed
and compressed vectors without explicit decompression.
Given a d-dimensional dataset, PQ first splits the vector
dimensions into M groups, then quantizes each dimension
group separately to generate M codebooks, each containing K
codewords (each codeword has d/M dimensions). Finally, one
codeword is picked from each codebook to encode an input
vector. The compressed vector has M parts, each of which
occupies $\log_2 K$ bits. An encoded vector is approximated
(decompressed) by the concatenation of the M assigned
codewords.
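The encoding and decoding steps described above can be sketched as follows. This is a minimal illustration in pure Python, not the implementation used in the paper; the function names `pq_encode` and `pq_decode` and the toy codebooks are assumptions made for the example, and the codebooks are taken as given rather than learned.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pq_encode(vector, codebooks):
    """Encode `vector` as M codeword indices, one per dimension group.

    `codebooks[m]` is a list of K codewords, each with d/M dimensions.
    """
    M = len(codebooks)
    sub_dim = len(vector) // M
    code = []
    for m, codebook in enumerate(codebooks):
        chunk = vector[m * sub_dim:(m + 1) * sub_dim]
        # Pick the nearest codeword of this codebook for this chunk.
        best = min(range(len(codebook)),
                   key=lambda k: squared_distance(chunk, codebook[k]))
        code.append(best)
    return code


def pq_decode(code, codebooks):
    """Approximate the original vector by concatenating the chosen codewords."""
    approx = []
    for m, k in enumerate(code):
        approx.extend(codebooks[m][k])
    return approx


# Toy setting (illustrative values only): d = 4, M = 2, K = 2.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],   # codebook for dimensions 0-1
    [[0.0, 1.0], [1.0, 0.0]],   # codebook for dimensions 2-3
]
code = pq_encode([0.9, 1.2, 0.1, 0.8], codebooks)
approx = pq_decode(code, codebooks)
```

With K = 256, each entry of `code` fits into one byte, which is the setting used in the experiments.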

Computing the distances between N PQ-compressed vectors and
an uncompressed vector x can be done efficiently in O(MN)
time via a smart use of lookup tables. This is introduced as
Asymmetric Distance Computation (ADC) in (Jegou, Douze, and
Schmid 2011). One can easily extend the idea to allow
efficient scalar product computation (Du and Wang 2014),
etc. PQ enables efficient Approximate Nearest Neighbor
search, where it achieves favorable memory/speed vs.
accuracy trade-offs against several competitive methods,
including hashing-based schemes and tree-based schemes (Ge
et al. 2013), (Norouzi and Fleet 2013). Researchers have
also developed various quantization methods motivated by
Product Quantization, e.g. Tree Quantization (Babenko and
Lempitsky 2015), Composite Quantization (Ting Zhang 2014),
Cartesian K-means (Norouzi and Fleet 2013), Additive
Quantization (Babenko and Lempitsky 2014), etc., to further
lower the quantization error.

Existing problem: Though ADC is efficient compared to
directly computing the distances, it still performs
excessive computations. Existing vector quantization
methods simply store the encodings sequentially in memory
and exhaustively perform ADC to compute the approximate
distances. However, in any quantized dataset, many
encodings share the same prefixes. These prefixes are
repeatedly computed with ADC, and they also take up
excessive memory.

Our contribution: In this paper, we propose the Encoding
Tree (E-Tree) to lower the memory consumption and speed up
the distance computation for encodings generated with vector
quantization methods. An E-Tree is a compact version of a
prefix tree in which nodes having only one leaf child are
recursively merged.

We propose the Hierarchical Memory Structure for the
Encoding Tree, which is designed for efficient depth-first
traversal and allows accelerated distance computation. To
perform accelerated distance computation, we maintain a very
short array of "partial" ADC results and traverse the tree
depth-first. The accelerated distance computation is cache
friendly and easily parallelized, as it accesses the memory
sequentially. Interestingly, with the Hierarchical Memory
Structure, we are able to speed up distance computation as
well as lower the memory consumption. For further speed-up,
one can generate an Encoding Forest by building multiple
E-Trees on different parts of the encodings, at a slight
cost in memory consumption.

As methods for fast distance computation, E-Tree/E-Forest
are fully compatible with various existing quantization
methods: one simply substitutes E-Tree/E-Forest for ADC in
the distance computation. E-Forest achieves up to 111.7%
speed-up compared to the naive ADC, and E-Tree lowers the
memory consumption by 12.5%. E-Tree/E-Forest can accelerate
various related algorithms significantly, e.g. Locally
Optimized Product Quantization by 74%, and IVFADC by 81%.
Applications relying on efficient distance computation could
greatly benefit from our methods.

Related Work 

Vector quantization is commonly applied to high-dimensional
data for efficient manipulation of the data, such as
computing distances between vectors. It essentially maps a
vector to a codeword and uses the codeword to approximate
the original vector. Take Product Quantization as an
example: it first decomposes the original data space into
the Cartesian product of M disjoint lower-dimensional
subspaces, and learns M codebooks
$C_m = \{c_m(1), \cdots, c_m(K)\}$, $m = 1, \cdots, M$, one
for each subspace. Then we encode a vector x with $C_m$ on
the corresponding dimensions to produce an M-encoding:
$x \to (i_1(x), i_2(x), \cdots, i_M(x))$. Padding the
codewords with zero chunks to obtain full-dimensional
codewords, the vector x can be reconstructed as
$x \approx c_1(i_1(x)) + c_2(i_2(x)) + \cdots + c_M(i_M(x))$.

We can perform Asymmetric Distance Computation (ADC),
introduced in (Jegou, Douze, and Schmid 2011), to compute
the distance between a vector and quantized vectors. The
Euclidean distance between a vector q and a database vector
x is approximated by:

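$$
\|q - x\|^2 \approx \Big\| q - \sum_{m=1}^{M} c_m(i_m(x)) \Big\|^2
= \sum_{m=1}^{M} \| q - c_m(i_m(x)) \|^2 - (M - 1)\|q\|^2
+ \sum_{i=1}^{M} \sum_{j=1,\, j \neq i}^{M} c_i(i_i(x))^\top c_j(i_j(x))
\qquad (1)
$$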

ADC allows fast massive distance computation: the first
term is computed only once for all vectors before the
distance computation and is stored in a precomputed distance
table, the second term is a constant for all database
vectors and can be omitted, and the third term is zero for
PQ-learned codebooks. Thus, the approximate distance between
q and a database vector x can be efficiently computed with M
table lookups and M - 1 additions. One can also easily
extend ADC to compute scalar products efficiently (Du and
Wang 2014), or to compute scalar products in kernel space
(Davis, Balzer, and Soatto 2014).
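The table-lookup scheme can be sketched as below. This is a minimal pure-Python illustration under stated assumptions: the names `build_distance_table` and `adc` are hypothetical, and the table is built on query chunks (the usual PQ formulation), so the sum of the M lookups is directly the squared distance to the reconstructed vector.

```python
def build_distance_table(query, codebooks):
    """table[m][k] = squared distance between the m-th query chunk and codeword k."""
    M = len(codebooks)
    sub_dim = len(query) // M
    table = []
    for m, codebook in enumerate(codebooks):
        chunk = query[m * sub_dim:(m + 1) * sub_dim]
        table.append([sum((a - b) ** 2 for a, b in zip(chunk, word))
                      for word in codebook])
    return table


def adc(table, code):
    """Approximate squared distance: M table lookups and M - 1 additions."""
    return sum(table[m][k] for m, k in enumerate(code))


# Toy setting (illustrative values only): d = 4, M = 2, K = 2.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
query = [1.0, 0.0, 0.0, 0.0]
table = build_distance_table(query, codebooks)   # computed once per query
approx_dist = adc(table, [1, 0])                 # encoding of one database vector

# Reference: distance to the explicitly decompressed vector [1, 1, 0, 1].
decoded = codebooks[0][1] + codebooks[1][0]
exact_to_decoded = sum((a - b) ** 2 for a, b in zip(query, decoded))
```

The table costs O(dK) to build per query and is then reused for every database vector, which is where the O(MN) total comes from.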

Researchers have also developed various similar
quantization-based methods to further lower the quantization
error and allow more precise distance computation. Optimized
Product Quantization (Ge et al. 2013) proposes to rotate the
data space for a better subspace partition. Additive
Quantization does not decompose the data space into
orthogonal subspaces; instead, it uses all dimensions to
generate the codebooks. This makes the third term in
Equation 1 non-zero, requiring additional information to be
stored along with the encoded dataset. Other methods include
Tree Quantization (Babenko and Lempitsky 2015), Composite
Quantization (Ting Zhang 2014), etc.; they all allow fast
distance computation in an ADC-like fashion.

Figure 2: An illustrative example of an Encoding Tree. Note
that on this tiny example we need 25% fewer memory accesses
and 13% fewer floating point additions to compute the
distances compared to ADC implementations. The acceleration
is more apparent on large-scale datasets.

Though ADC is much faster than brute-force distance
computation, it still makes up the majority of the time
consumed in applications like IVFADC (Jegou, Douze, and
Schmid 2011), in large-scale SVM training (Harchaoui et al.
2012) (Lebrun, Charrier, and Cardot 2004) and in other
applications involving large-scale data. For a large-scale
database containing millions of encoded vectors, many
encodings have the same prefixes, and these prefixes are
repeatedly calculated in ADC. A solution is therefore to
generate a prefix-tree-like structure to discover and avoid
the excessive computation. Interestingly, we found that such
a tree also lowers memory consumption. We propose the
Encoding Tree to accelerate distance computation. *

Encoding Tree 

An Encoding Tree is a variant of the prefix tree. The prefix
tree is a standard method for storing and searching strings
at scale. However, to our knowledge, the prefix tree has not
been introduced to handle the large-scale high-dimensional
data occurring in machine learning, computer vision, etc.
After generating the encoded vectors, one can effectively
store the encodings of a dataset in a prefix tree, in which
all the descendants of a node share the common prefix of the
encoding associated with that node. An illustrative example
of the tree structure is presented in Figure 2. In a prefix
tree a common prefix appears only once, so the memory it
would otherwise consume is saved; it also implies that we do
not need to calculate the "partial" ADC on this prefix
twice. Furthermore, in order to achieve higher speed,
accuracy and lower memory consumption, if a node has only a
single leaf descendant, the path from the node to the leaf
is compressed into one leaf node.

*It can also be easily extended to compute the inner product
or the inner product in kernel space. We omit the discussion
due to the length limit of the paper.

Figure 1: Hierarchical Memory Structure layout. The nodes of
the encoding tree are stored in depth-first traversal
sequence, yielding predictable memory access. An internal
node takes 2 Bytes, and a leaf node takes P + 2 + n' Bytes,
where P denotes the postfix length and n' denotes the number
of associated vectors.

Constructing Encoding Tree 


Algorithm 1 Construction of Encoding Tree
Input: N encoded vectors P_1, ..., P_N, each of which has
length M, in lexicographic order
Output: Encoding Tree
root ← P_1
lastPath := root
for each P_i do
    l := LongestCommonPrefix(lastPath, P_i)
    Merge lastPath{l+1 ... M}
    Create nodes: lastPath{l} → P_i{l+1} → P_i{l+2} ... → P_i{M}
    lastPath{l+1 ~ M} := P_i{l+1 ~ M}
    associate the i-th vector to lastPath{M}
end for
return root


To construct the Encoding Tree, a straightforward solution
is to directly adopt an existing prefix tree library and
compress the tree to achieve minimal memory consumption.
However, such general-purpose prefix tree implementations
are still too heavy for the Encoding Tree, as they rely on
excessive dynamic arrays and pointers. Memory consumption is
a critical problem in our algorithm, because we have to
store all the encodings in memory. In addition, dynamic
arrays and pointers are not friendly to intensive
computation. Therefore, a memory-efficient and
computation-friendly approach to maintaining the Encoding
Tree is urgently needed.

If the encodings are in order, we can efficiently generate
the Encoding Tree without the use of dynamic arrays and
excessive pointers. An in-place sort of the encoded dataset
can be done efficiently with existing libraries. We then
adopt Algorithm 1 to generate the Encoding Tree. We first
allocate enough memory for the tree; the tree can then
simply grow linearly without memory fragmentation. The time
complexity of generating the tree alone is O(MN) for a
dataset containing N encodings with M chunks. Taking the
in-place sort of the encoded dataset into consideration, the
total preparation time is O(MN log N).
The calculation time for constructing the Encoding Tree is
much smaller than that of the costly encoding phase, which
is $O(dKN)$ for encoding with PQ, or $O(dKN + d^2)$ for OPQ.
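The construction can be sketched as follows. This is a simplified illustration, not Algorithm 1 itself: instead of the `lastPath` bookkeeping on preallocated memory, it groups the sorted encodings recursively and emits the nodes in depth-first order. The function name `build_etree` and the tuple-based node format are assumptions made for the example.

```python
from itertools import groupby


def build_etree(encodings):
    """Return the E-Tree as a depth-first list of nodes.

    Internal node: ("internal", depth, chunk)
    Leaf node:     ("leaf", depth, chunk, postfix, ids)
    `depth` is the 0-based chunk position that `chunk` belongs to.
    """
    # Sort (encoding, id) pairs lexicographically, as Algorithm 1 assumes.
    items = sorted((tuple(code), idx) for idx, code in enumerate(encodings))
    nodes = []

    def emit(group, depth):
        # All items in `group` share the first `depth` chunks.
        for chunk, sub in groupby(group, key=lambda item: item[0][depth]):
            sub = list(sub)
            first_code = sub[0][0]
            if all(code == first_code for code, _ in sub):
                # Only one distinct encoding below: merge the path into a leaf.
                postfix = first_code[depth + 1:]
                ids = [idx for _, idx in sub]
                nodes.append(("leaf", depth, chunk, postfix, ids))
            else:
                nodes.append(("internal", depth, chunk))
                emit(sub, depth + 1)

    emit(items, 0)
    return nodes


# Toy dataset (illustrative): N = 4 encodings with M = 3 chunks each.
nodes = build_etree([(1, 2, 3), (1, 2, 4), (1, 5, 6), (7, 8, 9)])
```

In this toy tree the prefix `1` is stored once for three vectors and the prefix `1, 2` once for two, while the encoding `7, 8, 9` collapses into a single leaf with postfix `8, 9`.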

Hierarchical Memory Structure 

We propose the Hierarchical Memory Structure for the above
algorithm, and present the corresponding accelerated
asymmetric distance computation method in this section. An
illustration of the Hierarchical Memory Structure is
presented in Figure 1. The Hierarchical Memory Structure
stores the nodes in depth-first traversal sequence and is
thus actually "flat", which allows parallelism and
predictable memory access; this is crucial in practice for
High Performance Computing (HPC). We briefly introduce the
role of each field:

• Field K stores a single encoding chunk.

• Field Union is a branch indicator and stores the meta-data
of a node. The least significant bit indicates whether the
node is a leaf node or an internal node. For a leaf node,
the rest of the bits store the number of associated
vectors; for an internal node, they store the depth of the
node.

• Field idx holds the IDs of the vectors associated with a
node. This field only occurs when the node is a leaf node.

Since it is a depth-first traversal sequence, the
Hierarchical Memory Structure can be constructed separately
in different memory chunks, which are then concatenated to
form the whole structure. Thus Algorithm 1 can be
effectively accelerated with various parallelism methods.
The memory consumption of the Hierarchical Memory Structure
depends on the number of internal nodes $L_1$ and the number
of leaf nodes $L_2$ existing in the encoding tree. An
internal node requires 2 Bytes, while a leaf node takes up
$2 + P$ Bytes, where $P$ denotes the average leaf node
postfix length. For each dataset vector there is a
corresponding ID stored in a leaf node, taking $4N$ Bytes in
total. The total memory consumption is
$4N + 2(L_1 + L_2) + P L_2$ Bytes.
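The memory formula can be evaluated with a few lines of code. The sketch below assumes 4-byte vector IDs and one-byte chunks (K = 256); the function names and the toy node counts are hypothetical and chosen only to make the arithmetic visible. The term P·L2 is computed as the sum of the individual postfix lengths.

```python
def etree_memory_bytes(num_vectors, num_internal, leaf_postfix_lengths):
    """Total bytes: 4N + 2(L1 + L2) + P * L2, with P * L2 = sum of postfix lengths."""
    num_leaves = len(leaf_postfix_lengths)
    id_bytes = 4 * num_vectors                       # one 4-byte ID per vector
    header_bytes = 2 * (num_internal + num_leaves)   # fields K and Union
    postfix_bytes = sum(leaf_postfix_lengths)        # one byte per postfix chunk
    return id_bytes + header_bytes + postfix_bytes


def sequential_memory_bytes(num_vectors, M):
    """Plain sequential storage: M one-byte chunks plus a 4-byte ID per vector."""
    return num_vectors * (M + 4)


# Toy tree (illustrative): 4 vectors with M = 3, 2 internal nodes,
# 4 leaves whose postfix lengths are 0, 0, 1 and 2.
tree_bytes = etree_memory_bytes(4, 2, [0, 0, 1, 2])
flat_bytes = sequential_memory_bytes(4, 3)
```

On such a tiny tree the 2-byte node headers outweigh the saved prefix bytes (31 vs. 28 Bytes); the tree only pays off when long prefixes are shared by many encodings, as in the large-scale datasets considered in the experiments.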
Figure 3: To perform distance computation on the Hierarchical Memory Structure, one can sequentially access the memory to
traverse the tree depth-first, perform "partial" ADC on the current node, and store the result in the Distance Context Table to
avoid excessive computations.


Distance Computation with Hierarchical Memory Structure

In the distance computation phase, we traverse the tree
depth-first (in essence, we read the memory sequentially)
and perform a "partial" ADC on every node. We present
pseudocode in C fashion to elaborate the distance
computation:


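    float DistanceContext[M];      /* partial ADC result for each layer */
    Node* pointer = &root;
    int currentLayer = 0;

    do {
        /* partial ADC of the current path, including this node's chunk */
        distance = DistanceContext[currentLayer] + Precomputed[pointer->K];
        if (pointer->isLeaf) {
            /* add the remaining (postfix) chunks stored in the leaf */
            int PostfixLength = M - currentLayer;
            for (int i = 0; i < PostfixLength; ++i)
                distance += Precomputed[pointer->PostFix[i]];
            OutputResult();
            /* skip K, Union, the postfix and the associated vector IDs */
            pointer += 2 + PostfixLength + pointer->AssociatedVectorCount;
        }
        else {
            /* internal node: extend the Distance Context by one layer */
            currentLayer = pointer->LayerNumber;
            DistanceContext[currentLayer] =
                DistanceContext[currentLayer - 1] + Precomputed[pointer->K];
            pointer += 2;
        }
    } while (pointer != end);      /* end: first byte past the last node (assumed) */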

We maintain a short Distance Context to store the "partial"
ADC results computed so far. The Distance Context is updated
whenever we visit an internal node. We output the
computation results to a preallocated array that collects
the distances. We illustrate the procedure of distance
computation in Figure 3. Note that the construction of the
encoding tree always merges common prefixes, so every time
we update the Distance Context, we avoid excessive
computation. The final calculation time depends on the
number of nodes N' existing in the encoding tree and on the
average leaf node postfix length P†. Distance computation
with the Hierarchical Memory Structure requires O(N' + P)
computations in total.

Figure 4: The access frequency of different parts of the
Precomputed Distance Table. We used Product Quantization,
Optimized Product Quantization and Additive Quantization to
produce encoded vectors with K = 256, M = 8, and generated
the corresponding Encoding Trees. We perform one traversal
of each tree to obtain the access frequency.
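The traversal with a Distance Context can be sketched in a few lines. This is an illustrative simplification, not the byte-level layout of Figure 1: nodes are Python tuples `("internal", depth, chunk)` or `("leaf", depth, chunk, postfix, ids)` listed in depth-first order, every node carries its depth explicitly, and `table[m][k]` is the precomputed distance for chunk position m and codeword k. All of these names and the toy data are assumptions made for the example.

```python
def etree_distances(nodes, table, num_vectors):
    """Compute ADC distances for all vectors in one pass over the node list."""
    M = len(table)
    # context[d] holds the partial distance of the first d chunks on the current path.
    context = [0.0] * (M + 1)
    result = [0.0] * num_vectors
    for node in nodes:
        kind, depth, chunk = node[0], node[1], node[2]
        partial = context[depth] + table[depth][chunk]
        if kind == "internal":
            # Shared prefix: computed once, reused by every descendant.
            context[depth + 1] = partial
        else:
            _, _, _, postfix, ids = node
            for offset, k in enumerate(postfix, start=depth + 1):
                partial += table[offset][k]
            for idx in ids:
                result[idx] = partial
    return result


# Toy tree (illustrative) for the encodings
# (1,2,3), (1,2,4), (1,5,6), (7,8,9) with IDs 0..3.
nodes = [
    ("internal", 0, 1),
    ("internal", 1, 2),
    ("leaf", 2, 3, (), [0]),
    ("leaf", 2, 4, (), [1]),
    ("leaf", 1, 5, (6,), [2]),
    ("leaf", 0, 7, (8, 9), [3]),
]
table = [[float(10 * m + k) for k in range(10)] for m in range(3)]
distances = etree_distances(nodes, table, num_vectors=4)
```

The results equal those of plain ADC on the four encodings, but the lookup for the shared prefix `1` is performed once instead of three times.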

Distance computation with the Hierarchical Memory Structure
requires very little memory. The hot-spot data is the
Distance Context, which can be efficiently kept in registers
since it only occupies a few Bytes. For a usual ADC
implementation, the precomputed distance table may be too
big to fit into a higher-level cache, while with the
Hierarchical Memory Structure only part of the table is
frequently accessed, so our method is more cache friendly.
We present the access frequency of each part of the
precomputed distance table in Figure 4. As a sequential
memory accessing method, the Hierarchical Memory Structure
does not confuse the prefetch system on CPU/GPU. It also
allows the SIMD instructions available on modern CPU/GPU to
be used to further boost the performance.

†Or $M - D_l$, where $D_l$ denotes the average leaf node depth.

(a) Number of leaf nodes on the E-Tree/E-Forest, by using encoded
vectors obtained with different vector quantization methods.

(b) Number of internal nodes on the E-Tree/E-Forest.

(c) Average postfix length of the leaf nodes for E-Tree/E-Forest.

Figure 5: The statistics of the Encoding Tree. E-Tree NO
(resp. RS) refers to the original ordering (resp. randomized
ordering) with one E-Tree; 2xE-Tree refers to an E-Forest
with 2 E-Trees.

To sum up, the Hierarchical Memory Structure is cache
friendly and also lowers the total computation requirement.


Encoding Forests 

We can further lower the calculation time with multiple
Encoding Trees. An Encoding Forest is generated by splitting
the encodings into several parts and building an Encoding
Tree on each part separately. However, since an Encoding
Tree records the IDs of all dataset vectors on its leaf
nodes, an Encoding Forest has to record the vector IDs
multiple times, resulting in higher memory consumption.

To perform distance calculation with an Encoding Forest
consisting of multiple Encoding Trees, we first calculate
the "partial" distances with each Encoding Tree and output
the results into separate arrays. The distance is then
obtained by summing these result arrays. Note that summing
the result arrays is time consuming, so it is not
recommended to construct an Encoding Forest with too many
Encoding Trees. As observed in Figure 5(c), we recommend
constructing each Encoding Tree with at least 4 codes for
speed considerations.

In our implementation, we generate 2 Encoding Trees for a
balanced trade-off between memory consumption and
calculation time.
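The way an Encoding Forest combines its per-tree results can be sketched as follows. To keep the example short, `partial_distances` is a stand-in that runs plain ADC on one part of the encodings; in an actual E-Forest this step would be the traversal of one E-Tree. All function names and the toy data are assumptions made for the illustration.

```python
def split_encodings(encodings, num_parts):
    """Split every encoding into `num_parts` contiguous chunk ranges."""
    M = len(encodings[0])
    bounds = [M * p // num_parts for p in range(num_parts + 1)]
    parts = [[code[lo:hi] for code in encodings]
             for lo, hi in zip(bounds, bounds[1:])]
    return parts, bounds


def partial_distances(part, table_slice):
    """Stand-in for traversing one E-Tree: plain ADC restricted to one part."""
    return [sum(table_slice[m][k] for m, k in enumerate(code)) for code in part]


def forest_distances(encodings, table, num_parts=2):
    """Sum the per-part result arrays element-wise to get the full distances."""
    parts, bounds = split_encodings(encodings, num_parts)
    arrays = [partial_distances(part, table[lo:hi])
              for part, lo, hi in zip(parts, bounds, bounds[1:])]
    return [sum(values) for values in zip(*arrays)]


# Toy data (illustrative): N = 3 encodings with M = 4 chunks, K = 10.
encodings = [(1, 2, 3, 4), (1, 2, 5, 6), (7, 8, 9, 0)]
table = [[float(10 * m + k) for k in range(10)] for m in range(4)]
forest_result = forest_distances(encodings, table, num_parts=2)
full_adc = [sum(table[m][k] for m, k in enumerate(code)) for code in encodings]
```

The final summation touches one array of length N per tree, which is why the number of trees should stay small.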

Experiments and Discussions 

To examine the acceleration achieved by our method, we
generated encoded vectors of the SIFT1M dataset with Product
Quantization (Jegou, Douze, and Schmid 2011), Optimized
Product Quantization (Ge et al. 2013) and Additive
Quantization (Babenko and Lempitsky 2012). For Additive
Quantization we quantized the extra information to 1 Byte as
proposed in (Babenko and Lempitsky 2012) and stored the
extra information on the leaf node. For all methods, we
produce M = 8/16, K = 256 encodings. The vectors' IDs are
also stored along with the encoded vectors. We used a Core
i7 running at 3.6 GHz with 16 GB of memory to perform the
experiments.

Statistics of Encoding Tree 

There are three quantities of the Encoding Tree that we are
interested in:

• The number of internal nodes on the E-Tree. Every time we
visit an internal node and update the distance context
table, we avoid at least one excessive computation on the
dataset (an internal node has at least two children;
otherwise it is merged into a leaf node).

• The total number of leaf nodes on the E-Tree.

• The average postfix length of the leaf nodes. The
postfixes have to be computed for all leaf nodes, which is
the most time-consuming part.

The statistics are shown in Figure 5. It can be observed
that the number of leaf nodes is almost equal to the dataset
size for a single Encoding Tree; this is because vectors are
unlikely to be encoded into the same encoding. Nevertheless,
the postfix length is much smaller than the encoding length
M, so we can still gain a significant acceleration. We also
observed that the internal nodes are very few compared to
the leaf nodes. To conclude, most of the time spent on
distance computation goes into the postfix computation.































































































































Figure 6: The performance of accelerated distance
computation with the Encoding Tree. We include the
performance of Asymmetric Distance Computation for
reference.


Encoding Ordering 

Obviously, the acceleration of distance computation and the
compression rate of the encoded dataset with the Encoding
Tree are highly dependent on the length of the common
prefixes. If the encoded vectors have longer common
prefixes, i.e., deeper leaf nodes, our proposed Encoding
Tree performs better. The ordering of the encoding chunks
may influence the final tree size and therefore the speed of
distance computation. We compared the following orderings of
the encodings:

1. Original ordering. We generate the encoding tree directly
from the original encodings.

2. Randomized ordering. We first shuffle the encoding
chunks, then generate the encoding tree.

We adopt the different encoding arrangements and generate
the corresponding E-Trees. As depicted in the statistics in
Figure 5 and the performance in Figure 6, we found that the
encoding ordering has a relatively small impact on the
postfix/prefix lengths.
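How a chunk ordering changes prefix sharing can be checked with a small sketch. It assumes that "shuffling the encoding chunks" means applying one fixed permutation of the chunk positions to every encoding; the function names and the toy encodings are hypothetical, and the number of nodes of the uncompressed prefix tree is used as a simple proxy for tree size.

```python
def permute_chunks(encodings, order):
    """Reorder the chunk positions of every encoding by the same permutation."""
    return [tuple(code[m] for m in order) for code in encodings]


def count_trie_nodes(encodings):
    """Number of distinct non-empty prefixes, i.e. nodes of the uncompressed prefix tree."""
    prefixes = set()
    for code in encodings:
        for length in range(1, len(code) + 1):
            prefixes.add(tuple(code[:length]))
    return len(prefixes)


# Toy encodings (illustrative) that share prefixes in the original ordering.
encodings = [(1, 2, 3), (1, 2, 4), (1, 5, 6), (7, 8, 9)]
original_nodes = count_trie_nodes(encodings)

# Reversing the chunk order moves the distinguishing chunks to the front.
reversed_encodings = permute_chunks(encodings, [2, 1, 0])
reversed_nodes = count_trie_nodes(reversed_encodings)
```

Here the original ordering needs 9 nodes and the reversed one 12, since no prefixes are shared after reversal. For the distances to remain unchanged, the rows of the precomputed distance table have to be permuted in the same way as the chunks.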



                          IVFADC          LOPQ
ADC Time                  (74ms) 65ms     69ms
Time with E-Tree          55ms            59ms
Time with E-Forest        42ms            47ms
ADC Memory                8.01G           8.52G
Memory with E-Tree        7.17G           7.59G
Memory with E-Forest      8.82G           9.30G

Table 1: Applying E-Tree/E-Forest to IVFADC and LOPQ with
the configuration w = 64, K' = 8192, K = 256, M = 8
suggested in (Jegou et al. 2011). E-Tree brings significant
improvement over the original algorithms. Numbers in
brackets are reproduced from (Jegou et al. 2011).


Performance 

Figure 6 presents the distance computing time and memory
consumption for E-Tree and E-Forest. E-Tree/E-Forest achieve
their maximum acceleration ratio on smaller encodings. On
8-byte PQ encodings, the average distance computing time
with ADC is 2.678ms, while it takes only 1.265ms (111.7%
speed-up) with E-Forest or 1.760ms (52.2% speed-up) with
E-Tree. E-Tree also lowers the memory consumption by 12.5%.
E-Forest achieves a very cost-effective speed-up with 6.67%
more memory consumption.

On longer encodings the postfix length also increases. One
may generate an E-Forest with more E-Trees on longer
encodings to overcome this issue, at the cost of increased
memory consumption. E-Tree and E-Forest perform best on
smaller encodings, as the postfix is shorter.
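The reported percentages are consistent with defining the speed-up as the ratio of the ADC time to the accelerated time, minus one. This definition is inferred from the numbers above rather than stated explicitly:

```latex
\[
\text{speed-up} = \frac{t_{\mathrm{ADC}}}{t_{\mathrm{new}}} - 1, \qquad
\frac{2.678}{1.265} - 1 \approx 1.117 = 111.7\%, \qquad
\frac{2.678}{1.760} - 1 \approx 0.522 = 52.2\%.
\]
```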

Application on related algorithms 

We evaluate our methods on two simple applications of
Asymmetric Distance Computation, namely IVFADC, proposed in
(Jegou, Douze, and Schmid 2011), and Locally Optimized
Product Quantization, proposed in (Kalantidis and Avrithis
2014). We replace the ADC part with the Encoding Tree to
boost the search speed. The speed-up is shown in Table 1.
Similarly, one can apply E-Tree and E-Forest in any setting
that depends on fast approximate distance computation. One
can also extend E-Tree to allow fast scalar product
computation, etc.; we leave this as future work.

Conclusion 

E-Tree/E-Forest provide significant speed-up by generating
a tree to avoid excessive computation. The memory
consumption can also be lowered. E-Tree and E-Forest are
compatible with existing algorithms relying on ADC and can
bring them significant speed-up. In this paper we found that
the postfix length is the major limitation of the Encoding
Tree; how to reduce the length of the postfix or increase
the length of the prefix is left as future work.